-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Issue #3527] Modify the logic around the opportunity change tracking table to never delete records #3565
[Issue #3527] Modify the logic around the opportunity change tracking table to never delete records #3565
Conversation
api/src/db/models/task_models.py
Outdated
|
||
|
||
class JobTable(ApiSchemaTable, TimestampMixin): | ||
__tablename__ = "job" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's have a slightly longer name, what about task_log
or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on this? job_log
might also be good?
Also want to avoid naming the table XTable
… of https://github.com/HHS/simpler-grants-gov into mikehgrantsgov/3527-modify-load-opp-logic-never-delete
api/src/task/task.py
Outdated
except Exception: | ||
# Update job status to failed | ||
self.update_job(JobStatus.FAILED, metrics=self.metrics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want some tests to verify this works in a few scenarios to make sure the DB session works, two I can think of that might break here:
- An error occurs (unrelated to the DB), the DB session needs to be rolled back first
- An error occurs when doing a DB operation - I think the DB session does get rolled back, but it might be in an unusable state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chouinar I think the recent adds to test_task.py
cover the scenarios you mention here. If a db error is thrown in task
it will rollback the transaction (which I don't think SQLAlchemy does automatically?) and begin a new transaction so the FAILED
state can be stored for the job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQLAlchemy doesn't rollback automatically, but if it fails to get to the end of the transaction, it should at least fail to commit.
Co-authored-by: Michael Chouinard <[email protected]>
Co-authored-by: Michael Chouinard <[email protected]>
Co-authored-by: Michael Chouinard <[email protected]>
… of https://github.com/HHS/simpler-grants-gov into mikehgrantsgov/3527-modify-load-opp-logic-never-delete
… of https://github.com/HHS/simpler-grants-gov into mikehgrantsgov/3527-modify-load-opp-logic-never-delete
api/src/db/models/task_models.py
Outdated
|
||
|
||
class JobTable(ApiSchemaTable, TimestampMixin): | ||
__tablename__ = "job" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on this? job_log
might also be good?
Also want to avoid naming the table XTable
… of https://github.com/HHS/simpler-grants-gov into mikehgrantsgov/3527-modify-load-opp-logic-never-delete
api/src/task/task.py
Outdated
except SQLAlchemyError: | ||
# Rollback and begin new transaction only for DB-specific errors | ||
self.db_session.rollback() | ||
self.db_session.begin() | ||
raise | ||
except Exception: | ||
# For non-DB errors, just raise without touching the transaction | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what case would we want a non-DB error to still commit changes? If the task is failing, I'd rather we just rollback anyways and not commit. A partial success state seems more problematic, and if any such scenarios do exist, we should handle that on a particular task.
What about a pattern like this:
job_succeeded = True
try:
self.run_task()
... # timing stuff
except Exception:
self.db_session.rollback() # May or may not be necessary?
job_succeeded = False
raise
finally:
job_status = JobStatus.COMPLETED if job_succeeded else JobStatus.FAILED
with db_session.begin():
self.update_job(job_status, metrics=self.metrics)
Would this hit any issues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If run_task raises an exception, the timing stuff and metrics would not run. Not sure if that's a big issue, though. I think the rollback() is not necessary here since the task would handle its own tx stuff (right? we are not relying on this task class for tx management?), and the finally
block would start its new transaction with db_session.begin
.
Otherwise I don't see any other issues here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is quite right yet, but @chouinar if you have any ideas on this, maybe we can pair to make this implementation more elegant.
api/src/task/task.py
Outdated
except Exception: | ||
# Update job status to failed | ||
self.update_job(JobStatus.FAILED, metrics=self.metrics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQLAlchemy doesn't rollback automatically, but if it fails to get to the end of the transaction, it should at least fail to commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two small things, sorry for the slow reviews on this PR
api/src/task/task.py
Outdated
logger.info("Starting %s", self.cls_name()) | ||
start = time.perf_counter() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this get moved after the run_task
command? Should be the first thing in the try block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah was moving things around testing something out. Moved it back.
self.db_session.execute( | ||
update(OpportunityChangeAudit) | ||
.where(OpportunityChangeAudit.opportunity_id.in_(processed_opportunity_ids)) | ||
.values(updated_at=get_now_us_eastern_datetime()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the DB, we use UTC for everything, I would make this call the UTC function, not the eastern one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the dateutil here now, thanks
Summary
Fixes #3527
Time to review: 30 mins
Changes proposed
Add a new table to track task progress
Rename queue table and add all opportunities to it. Add any new opportunities that come in (should basically always be 1:1)
For index job, load data based on
last_loaded_at
date rather than usinghas_update
field or deleting records as we go.Context for reviewers
Cascade deleting orphan relationship remains unchanged
JobTable is meant to be general purpose and can be used, possibly ended, by other tasks.
Additional information
See unit tests